Generalized exponential distribution (genexpon)#

The generalized exponential distribution in SciPy (scipy.stats.genexpon) is a continuous distribution on \([0,\infty)\) with an especially simple hazard rate:

\[ h(x) = a + b\bigl(1-e^{-cx}\bigr), \qquad a,b,c>0. \]

So the instantaneous failure/event rate starts at \(h(0)=a\) and increases smoothly toward the asymptote \(h(\infty)=a+b\).


Learning goals#

  • write down the PDF/CDF/survival/hazard (including SciPy’s loc/scale form)

  • understand how \((a,b,c)\) control the hazard and the tail

  • compute moments via a convergent series and connect them to SciPy’s stats

  • sample in a NumPy-only way via inverse CDF (bisection)

  • visualize PDF/CDF and Monte Carlo samples, and fit the model in SciPy

import math

import numpy as np

import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

from scipy import stats
from scipy.special import gammaln
from scipy.stats import genexpon as genexpon_sp

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(7)

# A default parameter set used throughout
A0, B0, C0 = 1.2, 0.7, 2.0

1) Title & classification#

  • Name: genexpon (generalized exponential; SciPy’s scipy.stats.genexpon)

  • Type: continuous distribution

  • Standard support: \(x \in [0,\infty)\)

  • Parameter space (shape parameters): \(a>0\), \(b>0\), \(c>0\)

SciPy also provides a location–scale form:

  • location: \(\mathrm{loc} \in \mathbb{R}\)

  • scale: \(\mathrm{scale} > 0\)

  • support: \(x \in [\mathrm{loc},\infty)\)

with the standardization \(z=(x-\mathrm{loc})/\mathrm{scale}\).

2) Intuition & motivation#

Hazard-rate view (why genexpon is nice)#

For a nonnegative continuous random variable \(X\) with PDF \(f\) and CDF \(F\), the hazard rate is

\[ h(x) = \frac{f(x)}{1-F(x)}. \]

For genexpon, the hazard has the simple form

\[ h(x) = a + b(1-e^{-cx}) = (a+b) - b e^{-cx}. \]

Interpretation:

  • \(a\) is the baseline event rate at time 0.

  • \(b\) is an additional rate that “turns on” over time.

  • \(c\) controls how quickly the hazard transitions from \(a\) to \(a+b\).

So genexpon is a flexible replacement for an exponential waiting time when the “memoryless / constant hazard” assumption is too rigid.

Typical real-world use cases#

  • Reliability / survival analysis: time-to-failure with an increasing hazard that saturates (risk increases with age but levels off).

  • Event-time modeling: time-to-churn, time-to-conversion, time-to-default when risk ramps up then stabilizes.

  • Simulation: bounded, increasing intensity models in renewal/queueing-style simulations.

Relations to other distributions#

  • Exponential: if \(b=0\), then \(h(x)=a\) and \(X\sim\mathrm{Exponential}(\text{rate}=a)\).

  • Exponential with higher rate: as \(c\to\infty\), \(h(x)\) transitions rapidly to \(a+b\), and the distribution approaches \(\mathrm{Exponential}(\text{rate}=a+b)\).

  • Weibull-like near 0: for small \(c\), \(1-e^{-cx}\approx cx\), so \(h(x)\approx a + (bc)x\) (a “linear hazard” shape, like Weibull with shape near 2 when \(a\) is small).
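These limiting relations are easy to verify numerically. A minimal sketch (the parameter values and the stand-ins for the limits, b = 1e-10 and c = 500, are illustrative choices):

```python
import numpy as np
from scipy import stats
from scipy.stats import genexpon

a, b, c = 1.2, 0.7, 2.0
x = np.linspace(0.0, 3.0, 50)

# b -> 0: the hazard is constant at a, so genexpon -> Exponential(rate=a)
sf_small_b = genexpon.sf(x, a, 1e-10, c)
sf_exp_a = stats.expon.sf(x, scale=1.0 / a)
print("max |sf diff| as b -> 0 :", np.max(np.abs(sf_small_b - sf_exp_a)))

# c large: the hazard rises almost immediately to a + b
sf_large_c = genexpon.sf(x, a, b, 500.0)
sf_exp_ab = stats.expon.sf(x, scale=1.0 / (a + b))
print("max |sf diff| for c=500 :", np.max(np.abs(sf_large_c - sf_exp_ab)))
```

Both gaps shrink as the limit is approached; the second is of order \(b/c\), visible from the \((b/c)(1-e^{-cx})\) term in the survival function.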

3) Formal definition#

Let \(a,b,c>0\). The standardized genexpon distribution has support \(x\ge 0\) and PDF

\[ f(x; a,b,c) = \Bigl(a + b\bigl(1-e^{-cx}\bigr)\Bigr) \exp\Bigl(-(a+b)x + \frac{b}{c}\bigl(1-e^{-cx}\bigr)\Bigr), \qquad x\ge 0. \]

Define the survival function \(S(x)=1-F(x)\). A key convenience of genexpon is that the survival function has a closed form:

\[ S(x; a,b,c) = \exp\Bigl(-(a+b)x + \frac{b}{c}\bigl(1-e^{-cx}\bigr)\Bigr). \]

So the CDF is

\[ F(x; a,b,c) = 1 - S(x; a,b,c). \]

The hazard rate becomes

\[ h(x) = \frac{f(x)}{S(x)} = a + b(1-e^{-cx}). \]

SciPy location–scale parameterization#

SciPy uses the standard distribution plus loc and scale:

\[ X = \mathrm{loc} + \mathrm{scale}\,Y,\qquad Y\sim\text{standard genexpon}(a,b,c). \]

With \(z=(x-\mathrm{loc})/\mathrm{scale}\),

\[ f_X(x) = \frac{1}{\mathrm{scale}} f_Y(z),\qquad F_X(x)=F_Y(z). \]

In particular, the support is \(x\ge\mathrm{loc}\).

def _validate_genexpon_params(a, b, c, scale):
    if not (a > 0 and b > 0 and c > 0):
        raise ValueError("a, b, c must be > 0")
    if not (scale > 0):
        raise ValueError("scale must be > 0")


def genexpon_logsf(x, a, b, c, loc=0.0, scale=1.0):
    '''Log survival function, log P(X > x), for genexpon in SciPy's loc/scale form (NumPy-only).'''
    _validate_genexpon_params(a, b, c, scale)
    x = np.asarray(x, dtype=float)

    z = (x - loc) / scale
    out = np.zeros_like(z, dtype=float)  # logsf = 0 for z < 0 (sf = 1)

    mask = z >= 0
    zz = z[mask]

    one_minus_exp = -np.expm1(-c * zz)  # 1 - exp(-c z)
    out[mask] = (-a - b) * zz + (b / c) * one_minus_exp
    return out


def genexpon_sf(x, a, b, c, loc=0.0, scale=1.0):
    '''Survival function P(X > x) (NumPy-only).'''
    return np.exp(genexpon_logsf(x, a, b, c, loc=loc, scale=scale))


def genexpon_cdf(x, a, b, c, loc=0.0, scale=1.0):
    '''CDF P(X <= x), computed stably as -expm1(logsf) (NumPy-only).'''
    logsf = genexpon_logsf(x, a, b, c, loc=loc, scale=scale)

    x = np.asarray(x, dtype=float)
    z = (x - loc) / scale

    out = np.zeros_like(z, dtype=float)
    mask = z >= 0
    out[mask] = -np.expm1(logsf[mask])
    return out


def genexpon_logpdf(x, a, b, c, loc=0.0, scale=1.0):
    '''Log PDF in SciPy's loc/scale form (NumPy-only).'''
    _validate_genexpon_params(a, b, c, scale)
    x = np.asarray(x, dtype=float)

    z = (x - loc) / scale
    out = np.full_like(z, -np.inf, dtype=float)

    mask = z >= 0
    zz = z[mask]

    one_minus_exp = -np.expm1(-c * zz)
    hazard = a + b * one_minus_exp

    out[mask] = np.log(hazard) + (-a - b) * zz + (b / c) * one_minus_exp - np.log(scale)
    return out


def genexpon_pdf(x, a, b, c, loc=0.0, scale=1.0):
    '''PDF in SciPy's loc/scale form (NumPy-only).'''
    return np.exp(genexpon_logpdf(x, a, b, c, loc=loc, scale=scale))


def genexpon_hazard(x, a, b, c, loc=0.0, scale=1.0):
    '''Hazard rate h(x) = f(x) / (1 - F(x)) in SciPy's loc/scale form (NumPy-only).'''
    _validate_genexpon_params(a, b, c, scale)
    x = np.asarray(x, dtype=float)

    z = (x - loc) / scale
    out = np.zeros_like(z, dtype=float)

    mask = z >= 0
    zz = z[mask]

    one_minus_exp = -np.expm1(-c * zz)
    # hazard scales as 1/scale under x = loc + scale * z
    out[mask] = (a + b * one_minus_exp) / scale
    return out

4) Moments & properties#

Tail behavior and existence of moments#

Because the hazard converges to the constant \(a+b\), the tail is exponentially bounded:

\[ S(x) = \exp\Bigl(-(a+b)x + \tfrac{b}{c}(1-e^{-cx})\Bigr) = e^{b/c}\,e^{-(a+b)x}\,\exp\bigl(-(b/c)e^{-cx}\bigr) \sim e^{b/c}\,e^{-(a+b)x}\quad (x\to\infty). \]

So all polynomial moments exist, and the MGF exists at least for \(t<a+b\).

Raw moments (integer orders) via a convergent series#

For integer \(n\ge 1\) one convenient expression is

\[ \mathbb{E}[X^n] = n!\,e^{b/c}\sum_{k=0}^\infty \frac{(-b/c)^k}{k!\,(a+b+ck)^n}. \]

In particular:

  • mean: \(\mathbb{E}[X]=1!\,e^{b/c}\sum_{k\ge 0}\dfrac{(-b/c)^k}{k!\,(a+b+ck)}\)

  • second raw moment: \(\mathbb{E}[X^2]=2!\,e^{b/c}\sum_{k\ge 0}\dfrac{(-b/c)^k}{k!\,(a+b+ck)^2}\)

  • variance: \(\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2\)

Skewness and kurtosis can be computed from the first four raw moments.

MGF / characteristic function#

Using the same expansion (see Section 6), for \(t<a+b\):

\[ M(t)=\mathbb{E}[e^{tX}] =1 + t\,e^{b/c}\sum_{k=0}^\infty \frac{(-b/c)^k}{k!\,(a+b+ck-t)}. \]

The characteristic function is \(\varphi(\omega)=M(i\omega)\).

Entropy#

The (differential) entropy is

\[ H(X) = -\mathbb{E}[\log f(X)] = -\int_0^\infty f(x)\log f(x)\,dx. \]

No simple closed form is in common use; in practice the entropy is computed numerically (SciPy provides genexpon.entropy).
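As a sanity check, the entropy integral can be evaluated directly with scipy.integrate.quad and compared to SciPy's built-in value (a sketch; the default parameters are repeated so the cell is self-contained):

```python
import numpy as np
from scipy import integrate
from scipy.stats import genexpon

a, b, c = 1.2, 0.7, 2.0  # same defaults as A0, B0, C0

def neg_f_logf(x):
    # integrand of H(X) = -∫ f(x) log f(x) dx
    return -genexpon.pdf(x, a, b, c) * genexpon.logpdf(x, a, b, c)

H_quad, _ = integrate.quad(neg_f_logf, 0.0, np.inf)
H_scipy = float(genexpon.entropy(a, b, c))
print("entropy (quad) :", H_quad)
print("entropy (SciPy):", H_scipy)
```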

def genexpon_raw_moment_series(n, a, b, c, terms=5000):
    '''Raw moment E[X^n] for the *standard* genexpon(a,b,c) via a convergent series.'''
    if n < 1 or int(n) != n:
        raise ValueError("n must be a positive integer")
    n = int(n)

    k = np.arange(terms)
    lam = a + b + c * k

    # term_k = (-b/c)^k / (k! * lam^n)
    # Use log-space for numerical stability.
    sign = np.where(k % 2 == 0, 1.0, -1.0)
    log_abs = k * np.log(b / c) - gammaln(k + 1) - n * np.log(lam)
    series = np.sum(sign * np.exp(log_abs))

    return np.exp(b / c) * math.factorial(n) * series


def genexpon_mgf_series(t, a, b, c, terms=5000):
    '''MGF M(t) for t < a+b (standard genexpon), via a series.'''
    s = a + b
    if not (t < s):
        raise ValueError("MGF diverges for t >= a+b")

    k = np.arange(terms)
    denom = (a + b + c * k) - t

    sign = np.where(k % 2 == 0, 1.0, -1.0)
    log_abs = k * np.log(b / c) - gammaln(k + 1) - np.log(denom)
    series = np.sum(sign * np.exp(log_abs))

    return 1.0 + t * np.exp(b / c) * series


# Compare series moments to SciPy's numerical stats
params = (A0, B0, C0)
mean_sp, var_sp, skew_sp, kurt_excess_sp = genexpon_sp.stats(*params, moments="mvsk")
entropy_sp = genexpon_sp.entropy(*params)

m1 = genexpon_raw_moment_series(1, *params)
m2 = genexpon_raw_moment_series(2, *params)
m3 = genexpon_raw_moment_series(3, *params)
m4 = genexpon_raw_moment_series(4, *params)

mean_series = m1
var_series = m2 - m1**2
mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4

skew_series = mu3 / var_series ** 1.5
kurt_excess_series = mu4 / var_series**2 - 3

print("params (a,b,c):", params)
print("mean  (series):", mean_series)
print("mean  (SciPy) :", float(mean_sp))
print("var   (series):", var_series)
print("var   (SciPy) :", float(var_sp))
print("skew  (series):", skew_series)
print("skew  (SciPy) :", float(skew_sp))
print("kurtosis excess (series):", kurt_excess_series)
print("kurtosis excess (SciPy) :", float(kurt_excess_sp))
print("entropy (SciPy):", float(entropy_sp))

# MGF sanity check against Monte Carlo for a t < a+b

t = 0.4 * (A0 + B0)
mgf_series = genexpon_mgf_series(t, *params)

mc = genexpon_sp.rvs(*params, size=60_000, random_state=rng)
mgf_mc = np.mean(np.exp(t * mc))

print("\nMGF check at t=", t)
print("M(t) series:", mgf_series)
print("M(t) MC    :", mgf_mc)
params (a,b,c): (1.2, 0.7, 2.0)
mean  (series): 0.6330583443564004
mean  (SciPy) : 0.6330583443564002
var   (series): 0.32480146094947066
var   (SciPy) : 0.32480146094957457
skew  (series): 1.7447600281534774
skew  (SciPy) : 1.7447600281939166
kurtosis excess (series): 4.621541991785324
kurtosis excess (SciPy) : 4.621541990075665
entropy (SciPy): 0.5344790035243877

MGF check at t= 0.76
M(t) series: 1.8376730717305914
M(t) MC    : 1.8326085115945967

5) Parameter interpretation#

A useful way to interpret \((a,b,c)\) is through the hazard

\[ h(x)=a + b(1-e^{-cx}). \]

  • \(a\) (baseline hazard): \(h(0)=a\). Larger \(a\) pushes mass toward 0 (shorter waiting times).

  • \(b\) (extra hazard that turns on): sets the asymptotic increase. As \(x\to\infty\), \(h(x)\to a+b\).

  • \(c\) (transition speed): how quickly the hazard rises from \(a\) to \(a+b\).

    • large \(c\) → fast transition (closer to an exponential with rate \(a+b\))

    • small \(c\) → slow transition (hazard stays near \(a\) for longer)

Below we visualize how the hazard and PDF change when we vary \(c\).

# Grids for visualization
x_max = float(genexpon_sp.ppf(0.995, A0, B0, C0))
x = np.linspace(0, x_max, 800)

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=["Hazard h(x)", "PDF f(x)"],
)

# Vary c (transition speed)
for c in [0.5, 1.0, 2.0, 6.0]:
    fig.add_trace(
        go.Scatter(x=x, y=genexpon_hazard(x, A0, B0, c), mode="lines", name=f"c={c}"),
        row=1,
        col=1,
    )
    fig.add_trace(
        go.Scatter(x=x, y=genexpon_pdf(x, A0, B0, c), mode="lines", name=f"c={c}"),
        row=1,
        col=2,
    )

fig.update_layout(width=980, height=420, title="Effect of c (a and b fixed)")
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="hazard", row=1, col=1)
fig.update_yaxes(title_text="density", row=1, col=2)
fig.show()

6) Derivations#

6.1 From hazard to survival/CDF/PDF#

For a nonnegative lifetime \(X\) with hazard \(h(x)\), the survival function satisfies

\[ S(x)=\exp\Bigl(-\int_0^x h(t)\,dt\Bigr). \]

For genexpon, plug in \(h(t)=a + b(1-e^{-ct})\):

\[ \int_0^x h(t)\,dt = \int_0^x \bigl(a+b-b e^{-ct}\bigr)\,dt = (a+b)x + \frac{b}{c}(e^{-cx}-1). \]

Therefore

\[ S(x)=\exp\Bigl(-(a+b)x + \frac{b}{c}(1-e^{-cx})\Bigr), \qquad F(x)=1-S(x), \]

and the PDF is \(f(x)=h(x)S(x)\).

6.2 Expectation and variance#

For any nonnegative \(X\),

\[ \mathbb{E}[X] = \int_0^\infty S(x)\,dx, \qquad \mathbb{E}[X^n] = n\int_0^\infty x^{n-1} S(x)\,dx\ \ (n\ge 1). \]

Write the survival as

\[ S(x)=e^{b/c}\,e^{-(a+b)x}\,\exp\bigl(-(b/c)e^{-cx}\bigr) = e^{b/c}\,e^{-(a+b)x}\sum_{k=0}^\infty \frac{(-b/c)^k}{k!} e^{-ckx}. \]

Insert into the moment integral and integrate term-by-term:

\[ \int_0^\infty x^{n-1} e^{-(a+b+ck)x}\,dx = \frac{\Gamma(n)}{(a+b+ck)^n} = \frac{(n-1)!}{(a+b+ck)^n}. \]

This yields the raw-moment series used earlier:

\[ \mathbb{E}[X^n]=n!\,e^{b/c}\sum_{k=0}^\infty \frac{(-b/c)^k}{k!\,(a+b+ck)^n}. \]

Then

\[ \mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2. \]

6.3 Inverse CDF (Lambert W)#

Setting \(F(x)=p\) and solving for \(x\) gives a closed form using the Lambert W function. Define \(s=a+b\) and

\[ t = \frac{b - c\log(1-p)}{s}. \]

Then

\[ \mathrm{ppf}(p) = \frac{t + W\!\left(-\frac{b}{s}e^{-t}\right)}{c}, \qquad 0<p<1, \]

where \(W\) is the principal real branch of the Lambert W function (available as scipy.special.lambertw). Recent SciPy versions implement genexpon.ppf with this closed form; older versions invert the CDF numerically.
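The closed form takes only a few lines to implement. A sketch, checked against SciPy's ppf (genexpon_ppf_lambertw is a name introduced here, not a SciPy function):

```python
import numpy as np
from scipy.special import lambertw
from scipy.stats import genexpon

def genexpon_ppf_lambertw(p, a, b, c):
    """Inverse CDF of the standard genexpon via the principal Lambert W branch."""
    p = np.asarray(p, dtype=float)
    s = a + b
    t = (b - c * np.log1p(-p)) / s       # log1p(-p) = log(1 - p)
    w = lambertw(-(b / s) * np.exp(-t))  # argument lies in [-1/e, 0), so W is real
    return (t + w.real) / c

a, b, c = 1.2, 0.7, 2.0
p = np.array([0.05, 0.25, 0.5, 0.75, 0.95, 0.999])
print(genexpon_ppf_lambertw(p, a, b, c))
print(genexpon.ppf(p, a, b, c))
```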

6.4 Likelihood (i.i.d. sample)#

For observations \(x_1,\dots,x_n\) with \(x_i\ge 0\), the log-likelihood is

\[ \ell(a,b,c) = \sum_{i=1}^n \log\bigl(a + b(1-e^{-c x_i})\bigr) + \sum_{i=1}^n\Bigl(-(a+b)x_i + \frac{b}{c}(1-e^{-c x_i})\Bigr). \]

To fit parameters numerically, it’s common to optimize over unconstrained variables, e.g. \((\alpha,\beta,\gamma)=(\log a,\log b,\log c)\). SciPy’s genexpon.fit performs MLE under the hood.
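A minimal sketch of that log-parameterized MLE (the optimizer choice, starting point, and synthetic data are illustrative; genexpon.fit is the production route):

```python
import numpy as np
from scipy import optimize
from scipy.stats import genexpon

rng_fit = np.random.default_rng(0)
data = genexpon.rvs(1.2, 0.7, 2.0, size=2000, random_state=rng_fit)

def neg_loglik(theta, x):
    a, b, c = np.exp(theta)            # exp() keeps a, b, c > 0 automatically
    one_minus_exp = -np.expm1(-c * x)  # 1 - exp(-c x)
    return -np.sum(np.log(a + b * one_minus_exp)
                   - (a + b) * x + (b / c) * one_minus_exp)

res = optimize.minimize(neg_loglik, x0=np.zeros(3), args=(data,),
                        method="Nelder-Mead")
print("MLE (a, b, c):", np.exp(res.x))
```

Expect the recovered \((b,c)\) to be noisy: as discussed in the pitfalls section, they can be weakly identified.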

# Numerical sanity checks for E[X] and Var(X) via survival integrals on a grid

x_hi = float(genexpon_sp.ppf(0.9999, A0, B0, C0))
x_grid = np.linspace(0, x_hi, 80_000)

sf_grid = genexpon_sf(x_grid, A0, B0, C0)

# E[X] = ∫_0^∞ S(x) dx
mean_num = np.trapz(sf_grid, x_grid)

# E[X^2] = 2 ∫_0^∞ x S(x) dx
ex2_num = 2 * np.trapz(x_grid * sf_grid, x_grid)
var_num = ex2_num - mean_num**2

print("mean (series):", mean_series)
print("mean (grid)  :", mean_num)
print("var  (series):", var_series)
print("var  (grid)  :", var_num)
mean (series): 0.6330583443564004
mean (grid)  : 0.6330057127705082
var  (series): 0.32480146094947066
var  (grid)  : 0.32428303098588623

7) Sampling & simulation (NumPy only)#

Inverse transform sampling#

To sample from a continuous distribution, a standard method is:

  1. Draw \(U\sim\mathrm{Uniform}(0,1)\).

  2. Return \(X = F^{-1}(U)\).

For genexpon, \(F\) is explicit, and \(F^{-1}\) has a Lambert-W closed form.

Why use bisection here?#

If you want NumPy-only sampling (no special functions), you can still invert \(F\) numerically. Because \(F\) is monotone increasing, bisection is robust:

  • Maintain an interval \([\ell, u]\) such that \(F(\ell)\le p \le F(u)\).

  • Repeatedly cut the interval in half until the desired tolerance.

A convenient guaranteed upper bound uses the fact that \(h(x)\ge a\) for all \(x\ge 0\), so \(S(x)\le e^{-ax}\) and therefore

\[ F(x) \ge 1-e^{-ax}. \]

So choosing

\[ u = -\frac{\log(1-p)}{a} \]

guarantees \(F(u)\ge p\).

def genexpon_ppf_bisect(p, a, b, c, loc=0.0, scale=1.0, max_iter=70):
    '''Percent point function (inverse CDF) via bisection (NumPy-only).'''
    _validate_genexpon_params(a, b, c, scale)

    p = np.asarray(p, dtype=float)

    # Handle edge cases
    out = np.empty_like(p)
    out[p <= 0] = loc
    out[p >= 1] = np.inf

    mask = (p > 0) & (p < 1)
    if not np.any(mask):
        return out

    pp = p[mask]

    # Work in standardized coordinates z >= 0
    lo = np.zeros_like(pp)
    hi = -np.log1p(-pp) / a  # guaranteed bracket due to S(x) <= exp(-a x)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        cdf_mid = genexpon_cdf(mid, a, b, c)
        left = cdf_mid < pp
        lo[left] = mid[left]
        hi[~left] = mid[~left]

    out[mask] = loc + scale * hi
    return out


def sample_genexpon(size, a, b, c, loc=0.0, scale=1.0, rng=None):
    '''Sample from genexpon(a,b,c,loc,scale) using NumPy-only inverse-CDF sampling.'''
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(0.0, 1.0, size=size)
    return genexpon_ppf_bisect(u, a, b, c, loc=loc, scale=scale)


n = 20_000
samples = sample_genexpon(n, A0, B0, C0, rng=rng)

print("sample mean:", samples.mean())
print("sample var :", samples.var())
print("theory mean (series):", mean_series)
print("theory var  (series):", var_series)
sample mean: 0.630914735153167
sample var : 0.3263278688475253
theory mean (series): 0.6330583443564004
theory var  (series): 0.32480146094947066

8) Visualization#

Below are:

  • the analytic PDF and CDF

  • a Monte Carlo histogram overlaid with the PDF

x_max = float(genexpon_ppf_bisect(0.999, A0, B0, C0))
x_grid = np.linspace(0, x_max, 800)

pdf_grid = genexpon_pdf(x_grid, A0, B0, C0)
cdf_grid = genexpon_cdf(x_grid, A0, B0, C0)

fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=["PDF", "CDF", "Samples (hist) + PDF"],
)

fig.add_trace(go.Scatter(x=x_grid, y=pdf_grid, mode="lines", name="pdf"), row=1, col=1)
fig.add_trace(go.Scatter(x=x_grid, y=cdf_grid, mode="lines", name="cdf"), row=1, col=2)

fig.add_trace(
    go.Histogram(
        x=samples,
        nbinsx=70,
        histnorm="probability density",
        name="samples",
        opacity=0.6,
    ),
    row=1,
    col=3,
)
fig.add_trace(go.Scatter(x=x_grid, y=pdf_grid, mode="lines", name="pdf"), row=1, col=3)

fig.update_layout(width=1100, height=420, title=f"genexpon PDF/CDF + samples (a={A0}, b={B0}, c={C0})")
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_xaxes(title_text="x", row=1, col=3)
fig.update_yaxes(title_text="density", row=1, col=1)
fig.update_yaxes(title_text="probability", row=1, col=2)
fig.update_yaxes(title_text="density", row=1, col=3)
fig.show()

9) SciPy integration (scipy.stats.genexpon)#

SciPy provides scipy.stats.genexpon with shape parameters (a, b, c) plus loc and scale.

Common methods:

  • genexpon_sp.pdf(x, a, b, c, loc=..., scale=...)

  • genexpon_sp.cdf(x, a, b, c, loc=..., scale=...)

  • genexpon_sp.ppf(q, a, b, c, loc=..., scale=...) (uses a Lambert-W closed form)

  • genexpon_sp.rvs(a, b, c, loc=..., scale=..., size=..., random_state=...)

  • genexpon_sp.fit(data, ...) → MLE for (a, b, c, loc, scale) (you can fix loc/scale with floc/fscale)

Tip: for extreme tails, logpdf, logcdf, logsf are usually more stable than pdf, cdf, sf.

# Match our NumPy-only PDF/CDF to SciPy on a grid
x = np.linspace(0, float(genexpon_sp.ppf(0.95, A0, B0, C0)), 25)

pdf_diff = np.max(np.abs(genexpon_pdf(x, A0, B0, C0) - genexpon_sp.pdf(x, A0, B0, C0)))
cdf_diff = np.max(np.abs(genexpon_cdf(x, A0, B0, C0) - genexpon_sp.cdf(x, A0, B0, C0)))

print("max |pdf - scipy|:", pdf_diff)
print("max |cdf - scipy|:", cdf_diff)

# Demonstrate rvs + fit
loc_true, scale_true = 0.8, 1.3

data = genexpon_sp.rvs(A0, B0, C0, loc=loc_true, scale=scale_true, size=2500, random_state=rng)

# Fitting all parameters can be unstable; in applications you often fix loc if the support start is known.
a_hat, b_hat, c_hat, loc_hat, scale_hat = genexpon_sp.fit(data)

print("\ntrue (a,b,c,loc,scale):", (A0, B0, C0, loc_true, scale_true))
print("fit  (a,b,c,loc,scale):", (a_hat, b_hat, c_hat, loc_hat, scale_hat))

# Visualize fitted vs true PDF
x_grid = np.linspace(loc_true, np.quantile(data, 0.995), 600)

fig = go.Figure()
fig.add_trace(
    go.Histogram(x=data, nbinsx=60, histnorm="probability density", name="data", opacity=0.45)
)
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=genexpon_sp.pdf(x_grid, A0, B0, C0, loc=loc_true, scale=scale_true),
        mode="lines",
        name="true pdf",
    )
)
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=genexpon_sp.pdf(x_grid, a_hat, b_hat, c_hat, loc=loc_hat, scale=scale_hat),
        mode="lines",
        name="fitted pdf",
    )
)

fig.update_layout(width=950, height=420, title="SciPy fit: true vs fitted PDF")
fig.update_xaxes(title_text="x")
fig.update_yaxes(title_text="density")
fig.show()
max |pdf - scipy|: 1.1102230246251565e-16
max |cdf - scipy|: 5.551115123125783e-17
true (a,b,c,loc,scale): (1.2, 0.7, 2.0, 0.8, 1.3)
fit  (a,b,c,loc,scale): (0.7730376604261671, 0.5652789598973224, 1.2253635858863332, 0.8000577257220729, 0.8518451733244654)

10) Statistical use cases#

10.1 Hypothesis testing / goodness-of-fit#

A common workflow:

  1. Fit parameters (MLE).

  2. Check fit via the probability integral transform (PIT): compute \(u_i = F(x_i;\hat\theta)\). If the model is correct, the \(u_i\) should look approximately \(\mathrm{Uniform}(0,1)\).

  3. Use a uniformity test (e.g. KS test) with care: when parameters are estimated, classical p-values are not exact; use a parametric bootstrap if you need calibrated p-values.

10.2 Bayesian modeling#

In Bayesian survival modeling, you can place priors on positive parameters, e.g.

  • \(\log a,\log b,\log c \sim \mathcal{N}(0,\sigma^2)\) (log-normal priors for \(a,b,c\))

and use the genexpon log-likelihood.
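A minimal NumPy-only log-posterior sketch under those priors (the prior scale sigma and the demo data are illustrative assumptions):

```python
import numpy as np

def log_posterior(log_abc, x, sigma=1.0):
    """Log posterior for (log a, log b, log c) with independent N(0, sigma^2) priors."""
    a, b, c = np.exp(log_abc)
    one_minus_exp = -np.expm1(-c * x)  # 1 - exp(-c x)
    loglik = np.sum(np.log(a + b * one_minus_exp)
                    - (a + b) * x + (b / c) * one_minus_exp)
    logprior = -0.5 * np.sum(np.asarray(log_abc) ** 2) / sigma**2
    return loglik + logprior

# Evaluate at (a, b, c) = (1, 1, 1) on synthetic waiting times
x_demo = np.random.default_rng(1).exponential(0.6, size=200)
print(log_posterior(np.zeros(3), x_demo))
```

This function can be handed to any generic MCMC routine (e.g. a simple random-walk Metropolis loop) to draw posterior samples of \((a,b,c)\).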

10.3 Generative modeling#

Because sampling is straightforward (either via Lambert W or numerical inversion), genexpon is useful as a generative building block for waiting times in simulations:

  • simulate interarrival times

  • build renewal processes

  • stress-test systems with increasing-but-bounded hazard

# A simple goodness-of-fit workflow with PIT + a KS test

n = 1200
x = genexpon_sp.rvs(A0, B0, C0, size=n, random_state=rng)

# Fit the *standard* model (fix loc=0, scale=1) to focus on (a,b,c)
a_hat, b_hat, c_hat, _, _ = genexpon_sp.fit(x, floc=0, fscale=1)

u = genexpon_sp.cdf(x, a_hat, b_hat, c_hat, loc=0, scale=1)
ks = stats.kstest(u, "uniform")

print("fit (a,b,c):", (a_hat, b_hat, c_hat))
print("KS statistic:", ks.statistic)
print("KS p-value  :", ks.pvalue)
print("(p-value is not calibrated when parameters are estimated; bootstrap if needed)")

# PIT histogram
fig = go.Figure()
fig.add_trace(go.Histogram(x=u, nbinsx=40, name="PIT u_i", opacity=0.75))
fig.update_layout(width=850, height=350, title="Probability integral transform (should look Uniform(0,1))")
fig.update_xaxes(title_text="u")
fig.update_yaxes(title_text="count")
fig.show()

# Model comparison example: exponential vs genexpon (AIC)
loc_e, scale_e = stats.expon.fit(x, floc=0)

ll_exp = np.sum(stats.expon.logpdf(x, loc=0, scale=scale_e))
ll_gen = np.sum(genexpon_sp.logpdf(x, a_hat, b_hat, c_hat, loc=0, scale=1))

k_exp = 1
k_gen = 3

print("\nAIC (lower is better)")
print("AIC exponential:", 2 * k_exp - 2 * ll_exp)
print("AIC genexpon   :", 2 * k_gen - 2 * ll_gen)
fit (a,b,c): (1.1737074881086293, 0.8111183222726284, 1.4769073218547977)
KS statistic: 0.019322688734407012
KS p-value  : 0.7538128001659254
(p-value is not calibrated when parameters are estimated; bootstrap if needed)
AIC (lower is better)
AIC exponential: 1345.9993175118516
AIC genexpon   : 1325.120700479394

11) Pitfalls#

  • Invalid parameters: SciPy requires \(a,b,c>0\) (and scale>0). If your model suggests \(b=0\) (pure exponential), fit an exponential directly.

  • Identifiability / flat likelihood: \((b,c)\) can be weakly identified when \(c\) is extremely small or extremely large (different combinations can yield very similar hazards).

  • Fitting loc/scale: if you know the support starts at 0, fix floc=0 (and often fscale=1) to reduce instability.

  • Numerical underflow in tails: pdf(x) may underflow to 0 for large \(x\); prefer logpdf, logsf for likelihood work.

  • Sampling via numeric inversion: bisection is robust but can be slow for huge sample sizes; SciPy’s ppf uses Lambert W and is faster.
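The underflow pitfall is easy to demonstrate (a sketch; x = 500 is an arbitrary deep-tail point):

```python
import numpy as np
from scipy.stats import genexpon

a, b, c = 1.2, 0.7, 2.0
x_far = 500.0

# pdf underflows to 0.0 here (the log-density is around -949, far below
# what float64 can represent after exponentiation) ...
print("pdf   :", genexpon.pdf(x_far, a, b, c))
# ... while logpdf stays finite and is safe to use in likelihoods
print("logpdf:", genexpon.logpdf(x_far, a, b, c))
```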

12) Summary#

  • genexpon is a continuous distribution on \([0,\infty)\) with hazard \(h(x)=a + b(1-e^{-cx})\) that increases from \(a\) to \(a+b\).

  • It has an explicit survival function \(S(x)=\exp(-(a+b)x + (b/c)(1-e^{-cx}))\), so \(F(x)=1-S(x)\).

  • All moments exist; integer raw moments have a convenient convergent series: \(\mathbb{E}[X^n]=n!\,e^{b/c}\sum_{k\ge 0}\frac{(-b/c)^k}{k!\,(a+b+ck)^n}\).

  • Inverse CDF has a closed form using Lambert W (SciPy uses it); NumPy-only sampling can be done by monotone bisection.

  • In practice, scipy.stats.genexpon provides pdf/cdf/ppf/rvs/fit and is a handy alternative to an exponential model when hazard is increasing but bounded.